-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a planar digitisation algorithm #1
Conversation
This PR has been updated and it's ready for comments |
std::string m_collName; | ||
|
||
// inline static thread_local std::mt19937 m_engine; | ||
inline static thread_local std::mt19937 m_engine; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the 32 bit version of the MT enough here? Or do we need the mt19937_64
64 bit version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wouldn't be enough? It's >10^9 values for just drawing from a normal
auto seed = m_uidSvc->getUniqueID(headers[0].getEventNumber(), headers[0].getRunNumber(), this->name()); | ||
info() << "Using seed " << seed << " for event " << headers[0].getEventNumber() << " and run " | ||
<< headers[0].getRunNumber() << endmsg; | ||
m_engine.seed(seed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the size of the seed
in this case? The mersenne twister needs a comparatively large seed to properly initialize it's internal state. See, e.g. https://codereview.stackexchange.com/a/109266
I am not entirely sure that we have to do this for all events, but maybe we should initialize it properly (and reproducibly) once in initialize
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The seed is uint64_t
. But, as long as the seeds for every event are different then it shouldn't matter which seeds are being used . In the link I only see that if you have 2^32 seeds you have at most 2^32 choices for the first generated number so of course you can find them. For all events or not is a different discussion. Even when compiling without optimizations seeding a million times takes a few seconds so I don't think it's much of a problem. The idea of the UniqueIDGenSvc
was certainly to do this for each event and in the original processor it was being done for each event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with an improperly seeded MT (as far as I understand it), is that it will not be possible to reach all internal states (and thus random outputs?). So the actual amount of unique random numbers that you get might be much smaller than what is theoretically possible. I am mainly raising this because the Marlin processor used the GSL random library, specifically gsl_rng_ranlxs2
, which seems to be "more random" than the MT. I have://www.gnu.org/software/gsl/doc/html/rng.html#c.gsl_rng_ranlxs2), which seems to be "more random" than the MT. I am not sure any of this matters in the end, but it might be the source of somewhat subtle bugs since the random engine changes from Marlin to Gaudi.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmcarcell @tmadlener choosing a random engine is not easy... Geant4 default random engine is the so-called MinMax, described here, where they also comment about the MT the following:
In this sense, the underlying dynamical system of the Mersenne Twister has one order of magnitude less entropy than RCARRY. This is related to the singular flaw acknowledged by the authors of the Mersenne Twister, which is that MT has a very long recovery time when it is seeded by a vector containing mostly zeros: the output is observed to be non-random even after outputting a million values. In this instance, the divergence of some trajectory is observed away from the origin. In fact, from the dynamical system point of view this is not merely a manifestation of an unlucky initialization, since any two nearby trajectories of the MT diverge very slowly and this is ultimately what causes the failure of MT in the statistical tests. The total entropy of the MT system is approximately h ≈ 4.8. On the basis of the known behaviour of RCARRY [10,11] and our investigation of the MIXMAX, it appears that an entropy of at least h & 50 is required for a generator to have a sufficiently random trajectory. In light of this, it is perhaps not surprising that the Mersenne Twister fails many tests in its pure form, and still fails some tests even when some additional tempering of the output is applied
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tmadlener why not using the ROOT wrapper for the GSL random engine? Edit, I just found a bug that prevents its use at the moment...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tmadlener why not using the ROOT wrapper for the GSL random engine? Edit, I just found a bug that prevents its use at the moment...
Most likely because this is a port from the corresponding Marlin processor that did not have a ROOT dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oki, fair enough. In case you want to avoid depending on GSL random engines (because of software license), boost offers the same engines, link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is using now TRandom2
. I will merge this shortly since there hasn't been much discussion other than the random number generator, and the validation should already be done, although I will try to run again.
BEGINRELEASENOTES
ENDRELEASENOTES
Builds on top of key4hep/k4FWCore#173
Needs Gaudi v38 for using the RootHistograms from Gaudi
Validation has been done with a file with 25 events (Z->qq). On the Marlin side the Overlay has been disabled to make sure the input collections are not modified. Then, I ran the python file in this PR over that file. The parameters
are taken from https://github.com/iLCSoft/CLICPerformance/blob/master/clicConfig/clicReconstruction.xml.
So far everything I have checked looks good, these are the plots that both the processor and this algorithm make that I haven't changed (except the binning for diffu and diffv), the order is first the plot from the processor and then the plot from this algorithm.
This is the change in the local coordinate u for the accepted hits; the resolution is set to 0.003 mm which is what we get in the resolution.
Same as above but for v
This is the energy of the hit, should be identical in both cases. Note that there are more entries since this plot is done for every hit, including the ones that are not accepted.
This is the percentage of hits accepted for each event
This is the smearing in u but divided by the resolution; i.e. normalized to have sigma 1
Same as above but with v
Note that the number of accepted hits is the same in both cases but this isn't necessary since the random numbers generated for the smearing are different in each case.